Medical Image Analysis — Latest Matching Preprints

1

CorSeg-CineSAX: An Open-Source Deep Learning Framework for Fully Automatic Segmentation of Short-Axis Cine Cardiac MRI Across Multiple Cardiac Diseases

Xu, R.; Jiang, S.; Zhai, Y.; Chen, Y.

2026-04-03 cardiovascular medicine 10.64898/2026.04.01.26349955 medRxiv

Top 0.1%

52.3%

Show abstract

Background: Segmentation of the left ventricular myocardium, left ventricular cavity, and right ventricular cavity on short-axis cine cardiac magnetic resonance (CMR) images is essential for quantifying cardiac structure and function. However, existing automated segmentation tools are limited by small training datasets, narrow disease coverage, restrictive input format requirements, and the absence of anatomical plausibility constraints, hindering their clinical adoption. Methods: We constructed the largest annotated CMR short-axis segmentation dataset to date, comprising 1,555 subjects from 12 centers with five cardiac disease types and full cardiac cycle annotations totaling 319,175 labeled images. A MedNeXt-L model was trained using a 2D slice-by-slice strategy with full field-of-view input, eliminating dependencies on 3D volumes, temporal sequences, or region-of-interest(ROI) localization. A deterministic three-step post-processing pipeline was designed to enforce anatomical priors: connected component constraint, containment relationship constraint, and gap-filling constraint. The model was validated on an internal test set (310 subjects) and three independent public external datasets (ACDC, M&Ms1, M and Ms2; 855 subjects from 6 additional centers across 3 countries), spanning 15 cardiac disease categories-10 of which were never encountered during training. Results: The model achieved mean Dice similarity coefficients (DSC) of 0.913 {+/-} 0.037 and 0.911 {+/-} 0.040 on internal and external test sets, respectively, with a cross-domain performance gap of only 0.002. Post-processing eliminated all containment violations (7.5% [->] 0%) and gap errors (1.8% [->] 0%) while reducing fragment rates by 85.5% (9.0% [->] 1.3%). Zero-shot generalization to 10 unseen disease categories yielded DSC values ranging from 0.899 to 0.921. Automated clinical functional parameters demonstrated excellent agreement with manual measurements for left ventricular indices and right ventricular volumes (intraclass correlation coefficients [≥] 0.977). Conclusions: CorSeg-CineSAX provides a robust, open-source framework for fully automatic CMR short-axis segmentation across diverse clinical scenarios. All source code and pre-trained weights are publicly available at https://github.com/RunhaoXu2003/CorSeg.

2

Differential Network-Based Causal Graph Learning for Cardiovascular Recurrence Risk Prediction and Factor Discovery

Zhou, M.; Zhang, M.; Wang, J.; Shao, C.; Yan, G.

2026-03-18 cardiovascular medicine 10.64898/2026.03.16.26348547 medRxiv

Top 0.1%

43.4%

Show abstract

Cardiovascular disease is one of the leading causes of death worldwide, with myocardial infarction (MI) being a major cause of both morbidity and mortality among cardiovascular patients. MI Patients face a higher risk of cardiovascular disease recurrence afterwards. Therefore, accurately predicting the risk of recurrence and identifying key risk factors are crucial for clinical decision-making. In this paper, we consider the interrelationships among cardiovascular factors from a systemic perspective. We first construct a differential network for each patient to capture individual-specific deviations in factor relationships and propose a novel method, termed Causal Factor-aware Graph Neural Network (CFGNN), which integrates factor interactions to predict the recurrence risk of MI patients while uncovering key risk factors from a causal perspective. Experimental results demonstrate that CFGNN performs well on hospital-derived datasets in real world, effectively identifying several key risk factors. This method not only deepens our understanding of cardiovascular disease, but also paves the way for more targeted and effective interventions.

3

Quantum-Refined Latent Diffusion: A Hybrid Generative Framework for Imbalanced ECG Classification

Kritopoulos, G.; Neofotistos, G.; Barmparis, G. D.; Tsironis, G. P.

2026-04-13 cardiovascular medicine 10.64898/2026.04.09.26350502 medRxiv

Top 0.1%

40.3%

Show abstract

Class imbalance in clinical electrocardiogram (ECG) datasets limits the diagnostic sensitivity of automated arrhythmia classifiers, particularly for rare but clinically significant beat types. We propose a three-stage hybrid generative pipeline that combines a spectral-guided conditional Variational Autoencoder (cVAE), a class-conditional latent Denoising Diffusion Probabilistic Model (DDPM), and a Quantum Latent Refinement (QLR) module built on parameterized quantum circuits to augment minority arrhythmia classes in the MIT-BIH Arrhythmia Database. The QLR module applies a bounded residual correction guided by Maximum Mean Discrepancy minimization to align synthetic latent distributions with real class-specific latent banks. A lightweight 1D MobileNetV2 classifier evaluated over five independent random seeds and four augmentation ratios serves as the downstream benchmark. Our findings establish latent diffusion augmentation as an effective strategy for imbalanced ECG classification and motivate further investigation of quantum-classical hybrid methods in cardiac diagnostics.

4

CARDIAC-FM: A Multimodal Foundation Model for Cardiovascular Risk Prediction Using ECG and Cardiac MRI

Li, F.; Li, S.; Qian, Y.; Chen, B.; Brody, J. A.; Yogeswaran, V.; Wiggins, K. L.; Sitlani, C. M.; Bis, J. C.; Shojaie, A.; Longstreth, W. T.; Psaty, B. M.; Tison, G. H.; Du, S.; Floyd, J. S.; Ye, T.

2026-03-18 cardiovascular medicine 10.64898/2026.03.16.26348526 medRxiv

Top 0.1%

33.6%

Show abstract

Atrial fibrillation and heart failure impose substantial health burdens worldwide, yet existing prediction models lack sufficient accuracy and generalizability. We developed CARDIAC-FM, a multimodal foundation model that learns joint representations of 12-lead electrocardiogram (ECG) and cardiac magnetic resonance imaging (MRI) through contrastive learning. We trained CARDIAC-FM on 57,609 paired ECG-cardiac MRI samples from UK Biobank and evaluated it in two external cohorts: the Cardiovascular Health Study (CHS) and the Multi-Ethnic Study of Atherosclerosis (MESA). CARDIAC-FM consistently outperformed unimodal models across all cohorts, and jointly incorporating ECG features with established clinical risk scores yielded additive gains in discrimination, indicating that ECG and traditional risk factors capture complementary dimensions of cardiovascular risk. The learned representations improved prediction across a range of cardiovascular outcomes with minimal task-specific fine-tuning, reflecting real-world settings where many diseases have limited positive samples and lack dedicated risk models. Although trained on paired ECG and MRI data, CARDIAC-FM generates predictions using ECG alone or ECG combined with established risk scores, enabling broad clinical deployment without MRI. These findings demonstrate the promise of multimodal pre-training for generalizable cardiovascular risk prediction.

5

Hierarchical Barycentric Multimodal Representation Learning for Medical Image Analysis

Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.

2026-04-06 neurology 10.64898/2026.04.05.26350202 medRxiv

Top 0.1%

33.6%

Show abstract

Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality invariant and modality specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks brain tumor MRI segmentation and normative modeling demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.

6

Robust MR-AIV: A Systematic Study of Robustness Improvement and Sensitivity Analysis of MR-AIV

Vaezi, M.; Diego Toscano, J.; Guo, Y.; Stefan Gomolka, R.; Em. Karniadakis, G.; H. Kelley, D.; A. S. Boster, K.

2026-04-17 neuroscience 10.64898/2026.04.14.718498 medRxiv

Top 0.1%

23.0%

Show abstract

AO_SCPLOWBSTRACTC_SCPLOWCerebrospinal and interstitial fluid transport play a central role in brain metabolic waste clearance, yet non-invasive quantification of deep-brain flow dynamics remains challenging. Magnetic Resonance Artificial Intelligence Velocimetry (MR-AIV) is a physics-informed neural network framework that infers three-dimensional velocity, pressure, and permeability fields from dynamic contrastenhanced MRI by embedding porous-media flow physics into the learning process. Here, we present a methodological refinement and systematic evaluation of MR-AIV. We introduce a universal, anatomically informed, region-of-interest-based permeability initialization that improves anatomical alignment and physical consistency across subjects. We quantify the sensitivity of inferred fields to key modelling choices, including initialization strategies, permeability bounds, diffusivity assumptions, signal-concentration relationships, and measurement noise. Across these conditions, MR-AIV yields stable velocity and permeability estimates with preserved spatial structure. Together, these results establish practical guidelines and identify stable operating regimes for reliable deployment of MR-AIV. By improving robustness and reproducibility, this work strengthens MR-AIV as a minimally invasive approach for mapping brain-wide porous fluid transport and supports its application to studies of neurological health and disease.

7

Encoder-based Curvature-Aware Regularization for estimating asymmetric fiber orientation distribution functions in diffusion MRI

Taherkhani, M.; Pizzolato, M.; Morup, M.; Dyrby, T. B.

2026-04-02 neuroscience 10.64898/2026.03.31.715534 medRxiv

Top 0.1%

23.0%

Show abstract

Diffusion-weighted magnetic resonance imaging (dMRI) is used to study white matter microstructure and to delineate pathways by estimating fiber orientation distributions (FODs). Symmetric FODs represent the conventional model assuming antipodal symmetry in water diffusion. However, in complex regions with bending, branching or fanning fibers, this assumption is not guaranteed. To better capture such underlying fibers geometries, asymmetric FODs (A-FODs), derived from neighboring FODs, have been introduced. Here, we propose an Encoder-based Curvature-Aware Regularization (EnCAR) method for estimating A-FODs. Incorporating curvature features into the regularization weight applied to neighboring voxels improves reconstruction of A-FODs. A self-supervised Transformer network, combined with a Spherical Harmonics Semantic Encoder, learns region-specific regularization parameters from this local neighborhood to capture the diversity of fiber geometries across the brain. The EnCAR method was verified on the DiSCo challenge phantom, and applied to in vivo multi-shell Human data. The model estimated sharp, high-angular-resolution A-FODs that were well aligned with local fiber pathway. Compared with established FOD and A-FOD methods, it performed on par in regions dominated by symmetric FODs and outperformed them in complex asymmetric regions. Quantitative evaluation using the Asymmetry Index (ASI) and Model Discrepancy Index (MDI) confirmed improved consistency with the underlying diffusion signals. By ensuring smooth directional transitions, this work enhances the visibility of continuous fiber segments.

8

Fully Homomorphic Collaborative Learning for Safe Cross-Healthcare Institution Development and Implementation of Foundation Models

Bian, S.; Qiao, H.; Yan, T.; Xia, Z.; Gao, X.; Xu, Y.; Shen, R.; Ma, T.; Guan, Z.; Wang, Y. X.; Wong, T. Y.; Dai, Q.

2026-05-20 ophthalmology 10.64898/2026.05.15.26353345 medRxiv

Top 0.1%

22.6%

Show abstract

Foundation models (FMs) are powerful tools to allow the broad clinical application of artificial intelligence (AI) in healthcare systems, offering adaptability to different disease, modalities and clinical settings. However, FMs require large-scale datasets to train and fine-tune, while most real-world data are localized in siloed healthcare settings with strict data privacy protection, a restriction that poses a fundamental challenge in the cross-healthcare institution development of FMs. Here, we develop a fully homomorphic collaborative learning framework, named as FOCAL, that enables secure FM-driven diagnosis without exposing raw patient information. Different from traditional federated learning (FL) frameworks that aggregate locally trained models, FOCAL integrates fully homomorphic encryption (FHE) with split training to effectively execute collaborative learning completely over encrypted data. Specifically, we apply FOCAL on different types of retinal and pathology FMs to demonstrate its clinical performance. When facing gradient inversion attacks, FOCAL reduced the data leakage rate from 90.6% to 0% with comparable accuracy performance of the state-of-the-art FL paradigms, owing to the provable security provided by FHE. Moreover, under the same level of security, FOCAL can boost the macro-average AUROC by nearly 50% (from 0.5202 to 0.9831) when evaluated against fully encrypted FL models. In the multi-institution comparative experiments, FOCAL consistently outperforms all single-institution FMs, improving AUROCs by 9.62% and 14.46% on the ocular disease diagnosis and severity classification, respectively. Lastly, external validations on both retinal and pathology FMs further verified the accuracy and security advantages of FOCAL and highlighted its reliable interpretability and generalizability for cross-institution clinical development and implementation of FMs. FOCAL is a novel method to build a secure data-sharing AI community, facilitating healthcare institutions to benefit from and contribute to next-generation FMs development without compromising patient privacy and data security.

9

Modality Fusion of MRI and Clinical Data for Glioma Tumour Grading

Kheirbakhsh, R.; Mathur, P.; Lawlor, A.

2026-04-22 health informatics 10.64898/2026.04.20.26351308 medRxiv

Top 0.1%

22.4%

Show abstract

Multimodal machine learning leverages complementary information from diverse data sources and has shown strong promise in medical imaging, where multimodal data is critical for clinical decision making. In glioma grading, integrating MRI modalities with clinical data can improve diagnostic accuracy, yet systematic comparisons of fusion strategies remain limited. This study evaluates early, intermediate, and late fusion approaches, addressing the question: How does the inclusion of clinical data alongside MRI modalities influence grading performance? To assess modality contributions, we design adaptable fusion layers and employ interpretability techniques, including attention-based analysis. Our results show that incorporating clinical data consistently outperforms unimodal and MRI-only baselines, with intermediate fusion yielding the most reliable gains. Beyond accuracy, the framework reveals how MRI and clinical features jointly shape predictions, underscoring the importance of both fusion design and interpretability for clinical adoption.

10

CREB: Consistent Reference External Batch Harmonization

Kharade, A.; PAN, Y.; Andreescu, C.; Karim, H. T.

2026-03-12 bioengineering 10.64898/2026.03.10.710874 medRxiv

Top 0.1%

19.7%

Show abstract

AO_SCPLOWBSTRACTC_SCPLOWMachine learning models using functional magnetic resonance imaging (fMRI) are becoming increasingly popular - these models often rely on training data from multiple, large, and publicly available datasets. It is often necessary to harmonize these data across sites and sequences, and algorithms like ComBat are frequently applied to correct for these differences. This has been shown to improve model performance and generalizability. However, applying traditional ComBat necessitates harmonizing all data (train, validation, test, and other unseen external test sets) simultaneously, which leads to potential data leakage and limits application to new unseen data. We introduce Consistent Reference External Batch (CREB) harmonization, a novel extension of ComBat that learns the prior distribution of site effects exclusively from a designated training set. This learned prior serves as a consistent, easily deployable reference point that employs the empirical Bayes framework to update the site effect for any new, external unseen data. This approach enables training, validation, and test sets to be harmonized separately, thereby preventing data leakage, ensuring the integrity of downstream analyses, and application to new unseen data. CREB is different from traditional ComBat in which each sites prior distribution is estimated at once, but this cannot be applied to unseen data or data from sites not included in the original set of data. We tested CREB with train data from 2846 participants (ages 18-97 years) across 9 different studies and test data from 1113 participants (ages 18-88 years) from 3 studies. We evaluated the performance of harmonization with functional connectivity and gray matter volume. We show that CREB can effectively harmonize the test data to the train data, and have comparable performance to ComBat. CREB is able to conduct this harmonization in a two-step procedure that prevents leakage and is deployable to new unseen data. Finally, we tested whether CREB could similarly preserve biological variance (e.g., whether age associations were preserved after harmonization). We found that CREB, like ComBat could preserve age associations with both functional connectivity and gray matter volume measures. CREB provides an easily deployable, robust harmonization method to standardize data to a common reference distribution, making it uniquely suitable for training generalizable machine learning models.

11

Shortkit-ML: A Unified Multi-Perspective Framework for Detecting Shortcut Learning in Medical Imaging Embeddings

Cajas, S.; Marzullo, A.; Kapadia, S.; Santos, F.; Ocampo Osorio, F.; Kong, Q.; Quarta, A.; Kuo, P.-C.; Patel, M.; Rojas Sillery, R. I.; Celi, L. A.

2026-04-30 health informatics 10.64898/2026.04.29.26352053 medRxiv

Top 0.1%

18.9%

Show abstract

AO_SCPLOWBSTRACTC_SCPLOWShortcut learning poses a significant challenge in clinical artificial intelligence, as models may rely on spurious signals rather than clinically relevant features, leading to biased predictions and poor generalization. Existing detection methods are fragmented and lack systematic evaluation across datasets and model architectures. To address this issue, we propose ShortKit-ML, an open-source Python framework for unified shortcut analysis in embedding spaces. The framework integrates over 20 detection methods and six mitigation strategies within a modular pipeline, encompassing embedding analysis, fairness metrics, training dynamics, causal methods, explainability, and representation analysis. We evaluate the framework on chest X-ray datasets (CheXpert and MIMIC-CXR), synthetic benchmarks, and an out-of-domain dataset (CelebA). Experimental results demonstrate that multi-method auditing provides more stable and interpretable evidence than individual methods, while detector disagreement reveals meaningful representational differences. The proposed framework offers automated reporting, interactive visualization, and is available as a pip-installable package. The source code and documentation are publicly available at https://github.com/criticaldata/ShortKit-ML and https://criticaldata.github.io/ShortKit-ML/.

12

FiberLM: A Transformer-Based Model for Mouse Brain Diffusion MRI Tractography Guided by Viral Tracer Data

Wen, R.; Zhang, J.; Liang, Z.

2026-05-11 neuroscience 10.64898/2026.05.06.723316 medRxiv

Top 0.1%

18.9%

Show abstract

Diffusion MRI (dMRI) tractography provides a non-invasive method for mapping whole-brain structural connectivity. However, its application is limited by substantial false-positive and false-negative connections. While deep learning based methods have shown promise in improving tractography, most rely on training data derived from conventional dMRI tractography, therefore inheriting the same limitations. Here, we introduce FiberLM, an attention-based Transformer model for mouse brain tractography. The model was trained using a whole-brain streamline dataset based on viral tracer data from the Allen Mouse Brain Connectivity Atlas (AMBCA), allowing the model to learn the properties of both local and long-range axonal trajectories through self-attention. FiberLM was applied to predict anatomically plausible axonal trajectories from ex vivo high-resolution mouse brain dMRI data. Quantitative evaluations demonstrated that FiberLM significantly reduced false-positive and false-negative connections, improved spatial agreement with tracer-defined pathways, and generated whole-brain connectomes that more closely approximated AMBCA results compared to conventional tractography. These findings suggest FiberLM as a potential tool for accurate reconstruction of mouse brain structural connectomics.

13

Interpretable Hierarchical RNNs for rs-fMRI: Promise and Limits of Individualized Brain Dynamics

Barkhau, C. B. C.; Mahjoory, K.; Brenner, M.; Weber, E.; Leenings, R.; Pellengahr, C.; Winter, N. R.; Konowski, M.; Straeten, T.; Meinert, S.; Leehr, E. J.; Flinkenfluegel, K.; Borgers, T.; Grotegerd, D.; Meinert, H.; Hubbert, J.; Jurishka, C.; Krieger, J.; Ringels, W.; Stein, F.; Thomas-Odenthal, F.; Usemann, P.; Teutenberg, L.; Nenadic, I.; Straube, B.; Alexander, N.; Jansen, A.; Jamalabadi, H.; Kircher, T.; Junghoefer, M.; Dannlowski, U.; Hahn, T.

2026-03-23 neuroscience 10.64898/2026.03.20.713153 medRxiv

Top 0.1%

18.8%

Show abstract

Modeling individual brain dynamics from resting-state fMRI (rs-fMRI) remains challenging due to substantial inter-subject variability, measurement noise, and limited data length per subject. Here, we systematically evaluate a hierarchical dynamical systems framework based on shallow piecewise-linear recurrent neural networks (shPLRNNs) for individualized modeling of rs-fMRI data, with a particular focus on reproducing subject-specific functional connectivity (FC). We applied the framework to 1,423 rs-fMRI samples from healthy participants of the Marburg-Munster Affective Disorders Cohort Study (MACS). Simulated rs-fMRI data robustly reproduced empirical FC patterns, with comparable reconstruction accuracy on training and independent validation sets. Generalization to unseen individuals was heterogeneous and strongly depended on how typical a subjects connectivity pattern was relative to the training cohort, with template similarity explaining 37% of variance in reconstruction accuracy. Learned subject-specific parameters exhibited significant test-retest stability and higher within-subject than between-subject similarity on longitudinal data from two different timepoints, supporting their interpretation as individualized dynamical markers. Associations between individual parameters and demographic or cognitive variables were statistically significant but modest in effect size, and predictive performance remained below that obtained using empirical rs-fMRI features directly. Together, these results demonstrate that hierarchical shPLRNNs can extract meaningful and stable individual-specific dynamical structure from rs-fMRI data, while highlighting current limitations in capturing fine-grained individual differences. The findings delineate key trade-offs between model expressivity, generalization and subject specificity, and point to directions for future methodological refinement in individualized brain modeling.

14

Structurally Restricted Message-Passing within Shallow Architectures for Explainable Network-Level Brain Decoding on Small Cohorts

Marques dos Santos, J. D.; Ramos, M. B.; Reis, L. P.; Marques dos Santos, J. P.; Direito, B.

2026-03-23 bioinformatics 10.64898/2026.03.20.713136 medRxiv

Top 0.1%

18.8%

Show abstract

The application of artificial intelligence (AI) to functional magnetic resonance imaging (fMRI) has gained increasing attention due to its ability to model complex, high-dimensional brain data and capture nonlinear patterns of neural activity. However, deep learning architectures, such as Graph Neural Networks (GNNs), typically require large sample sizes to achieve stable convergence, limiting their applicability in neuroimaging contexts where data are often scarce. This challenge highlights the need for compact, data-efficient models that maintain predictive performance and interpretability. Shallow neural networks (SNNs) have demonstrated robustness in low-sample settings but commonly rely on region-level features that treat brain areas independently, overlooking the brains intrinsically network-based organization. To address this limitation, we propose a structurally constrained message-passing framework that integrates diffusion tensor imaging (DTI)-derived structural connectivity with region-level fMRI signals within a shallow architecture. This approach enables network-level modeling while preserving the stability and data efficiency of SNNs. The method is evaluated on 30 subjects performing a Theory of Mind (ToM) task from the Human Connectome Project Young Adult dataset. A baseline SNN achieved global accuracies of 88.2% (fully connected), 80.0% (pruned), and 84.7% (retrained), while the proposed model achieved 87.1%, 77.6%, and 84.7%, respectively. Although structural constraints led to a more pronounced performance decrease after pruning, retraining restored accuracy to baseline levels, demonstrating that biological constraints can be incorporated without compromising predictive validity. Model interpretability was assessed using SHAP (Shapley Additive Explanations). While the baseline model primarily identified isolated regions as key contributors, the proposed framework revealed distributed, structurally coherent networks as the main drivers of classification. These networks showed correspondence with established ToM regions, including the temporo-parietal junction, superior temporal sulcus, and inferior frontal gyrus. Importantly, the findings suggest that groups of moderately informative regions can collectively form highly relevant subnetworks. Overall, the proposed framework achieves competitive performance in a limited dataset while incorporating graph-inspired message passing into a shallow architecture. Its explainability provides insight into how structurally constrained networks support stimulus-driven responses in ToM and demonstrates potential for investigating network dysfunction in disorders such as Alzheimers disease, ADHD, autism spectrum disorder, bipolar disorder, mild cognitive impairment, and schizophrenia.

15

Auxiliary Clinical Prompt Integration into Vision-Language Prompt SAM for Brain Tumor Segmentation

Hakata, Y.; Oikawa, M.; Fujisawa, S.

2026-04-17 health informatics 10.64898/2026.04.15.26351001 medRxiv

Top 0.1%

18.8%

Show abstract

BackgroundAdult diffuse glioma is a representative class of primary brain tumors for which accurate MRI-based tumor segmentation is indispensable for treatment planning. Conventional automated segmentation methods have relied primarily on image information and spatial prompts, and auxiliary clinical information that is routinely acquired in clinical practice has not been sufficiently exploited as an input. ObjectiveBuilding on a dual-prompt-driven Segment Anything Model (SAM) extension framework [20] that fuses visual and language reference prompts, we propose a method that integrates patient demographics, unsupervised molecular cluster variables derived from TCGA high-throughput profiling, and histopathological parameters as learnable prompt embeddings, and we evaluate its effect on the accuracy of lower-grade glioma (LGG) MRI segmentation. MethodsAn auxiliary prompt encoder converts clinical metadata into high-dimensional embeddings that are fused with the prompt representations of Segment Anything Model (SAM) ViT-B through a cross-attention fusion mechanism. The TCGA-LGG MRI Segmentation dataset (Kaggle release by Buda et al. [24]; n = 110 patients; WHO grade II-III) was split at the patient level (train/val/test = 71/17/22) using three different random seeds, and the three slices with the largest tumor area were extracted from each patient. To avoid pseudo-replication arising from multiple slices per patient and repeated measurements across seeds, our primary analysis aggregated Dice and 95th-percentile Hausdorff distance (HD95) to the patient x seed unit (n = 66); secondary analyses at the unique-patient level (n = 22) and at the per-slice level (n = 198) are also reported. Pairwise comparisons used paired t-tests with Bonferroni correction (k = 3) and Wilcoxon signed-rank tests, and a permutation test (K = 30) served as an auxiliary check of effective use of the auxiliary information. ResultsAt the patient x seed level (n = 66), Proposed (full clinical) achieved a Dice gain of {Delta} = +0.287 over the zero-shot SAM ViT-B baseline (paired-t p = 4.2 x 10-{superscript 1}, Cohens d_z = +1.25, Bonferroni-corrected p << 0.001; Wilcoxon p = 2.0 x 10-{superscript 1}), and HD95 improved from 218.2 to 64.6. Because zero-shot SAM is not designed for domain-specific medical segmentation, the large absolute HD95 gap largely reflects the expected domain gap rather than a competitive baseline. The additional contribution of the full clinical configuration over the demographics-only configuration was {Delta} Dice = +0.023 (paired-t p = 0.057, Bonferroni-corrected p = 0.172), which did not reach statistical significance at the patient level and is reported as a directional trend. The permutation test (K = 30, seed 2025) yielded real-metadata Dice = 0.819 versus a shuffled-metadata mean of 0.773, giving an empirical p = 0.032 = 1/(K + 1), which is at the resolution limit of this test and should therefore be interpreted as preliminary evidence. ConclusionsIntegrating auxiliary clinical information as multimodal prompts produced a large improvement over the zero-shot SAM baseline on this LGG cohort. More importantly, a robustness analysis showed that Proposed (full clinical) outperformed the trained Base (no auxiliary information) under all tested spatial-prompt conditions, including perfect centroid ({Delta} = +0.014), and that the advantage was most pronounced in the prompt-free regime ({Delta} = +0.231, p = 0.039), where the base model collapsed but the proposed model maintained meaningful segmentation by leveraging clinical metadata alone. The additional contribution of molecular and histopathological information beyond demographics was not statistically resolved at the patient level ({Delta} = +0.023, n.s.). Establishing clinical utility will require external validation on larger multi-center cohorts and direct comparisons with established segmentation methods.

16

Bi-cross-validation: a data-driven method to evaluate dynamic functional connectivity models in fMRI

Wei, Y.; Smith, S. M.; Gohil, C.; Huang, R.; Griffin, B.; Cho, S.; Adaszewski, S.; Fraessle, S.; Woolrich, M. W.; Farahibozorg, S.-R.

2026-04-06 neuroscience 10.64898/2026.04.02.716067 medRxiv

Top 0.1%

18.7%

Show abstract

Dynamic functional connectivity (dFC) models have become increasingly popular over the past decade for characterising time-varying interactions between brain regions. However, assessing and comparing dFC models remains challenging. Here, we introduce bi-cross-validation as a general framework for evaluating dFC models and selecting key hyperparameters, such as the number of states. By jointly partitioning the data across subjects and brain regions, bi-cross-validation enables out-of-sample evaluation without re-estimating latent states on the same data used for testing, thereby avoiding circularity. Using simulated data with known ground-truth dynamics, we show that bi-cross-validation favours models that accurately capture the underlying state structure. Applying the framework to real resting-state fMRI data, we demonstrate that bi-cross-validation naturally balances goodness-of-fit against model complexity, with performance improving and then declining as model complexity increases. Finally, we use bi-cross-validation to directly compare static and dynamic FC models, showing that dynamic models underperform static models at low spatial dimensionality, but outperform static models at sufficiently high dimensionality. Together, these results establish bi-cross-validation as a principled tool for dFC model selection, evaluation, and comparison.

17

PINN-ing the Balloon: A Physically Informed Neural Network Modelling the Nonlinear Haemodynamic Response Function in MRI

Avaria-Saldias, R. H.; Ortiz, D.; Palma-Espinosa, J.; Cancino, A.; Cox, P.; Salas, R.; Chabert, S.

2026-04-07 neuroscience 10.64898/2026.04.04.716499 medRxiv

Top 0.1%

18.3%

Show abstract

Accurate characterisation of the haemodynamic response function (HRF) is central to interpreting blood-oxygen-level-dependent (BOLD) signals in functional magnetic resonance imaging, yet standard estimation approaches remain centred around phenomenological formulations lacking biophysical grounding. We present a physics-informed neural network (PINN) framework that bridges these paradigms by embedding the Balloon-Windkessel model directly into the training objective of a multi-headed Neural Network. Our aproach simultaneously estimates probable latent neurovascular state variables such as cerebral blood inflow, metabolic rate of oxygen consumption, blood volume, and deoxyhaemoglobin content, through an indirect optimisation scheme in which the predicted BOLD signal is obtained via convolution of the estimated HRF with experimental stimuli. Training is governed by a composite loss, balancing differential-equation residuals, physiological initial conditions and data fidelity. In simulations with temporal signal-to-noise ratios representative of clinical acquisitions, the framework recovered ground-truth state variables with coefficients of determination exceeding 0.99 and mean squared errors below 10-3, at a physics-to-data weighting of 0.40:0.60. Application to 1.5 T block-design fMRI data from an ischaemic stroke patient yielded physiologically plausible, subject-specific HRF estimates, establishing feasibility of single-subject, physics-constrained HRF inference without reliance on fixed gamma basis assumptions.To our knowledge, this constitutes the first deployment of a single PINN incorporating the full Balloon-Windkessel model within an indirect training objective, reconstructing full BOLD observations, positioning PINN-based haemodynamic modelling as a principled and personalised route towards more interpretable and patient-specific fMRI biomarkers.

18

Regression vs. Medical LLMs: A Comprehensive Study for CVD and Mortality Risk Prediction

KOM SANDE, S. D.; Skorski, M.; Theobald, M.; Schneider, J.; Marz, W.

2026-03-11 health informatics 10.64898/2026.03.11.26347789 medRxiv

Top 0.1%

17.4%

Show abstract

Cardiovascular diseases (CVDs) remain the foremost cause of global morbidity and mortality, driving an urgent need for robust predictive tools that enable early detection and preventive intervention. Traditional regression-based models--such as linear and logistic regression, regression trees and forests, and Support Vector Machines (SVMs)--have long underpinned CVD risk estimation but often assume linear relationships, homogeneous effects across populations, and a limited number of predictors. Recent advances in regression, such as bagging and boosting, as well as Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) are increasingly shifting this paradigm. In this paper, we review key developments in the context of both classic regression techniques and recent GenAI approaches, and we put a particular focus on openly available Medical LLMs (MedLLMs) in combination with few-shot prompting and classification finetuning. Based on the LURIC cardiovascular health study, we investigate a broad variety of biomarkers and risk factors under two different cohorts of 3,316 CVD risk patients who underwent coronary angiography in Germany between 1997 and 2000. Our results demonstrate that large, pretrained MedLLMs (70B) achieve up to 82% AUROC for 1-year all-cause mortality (1YM) prediction with optimized few-shot prompting, thus performing competitively with recent regression techniques and state-of-the-art methods from the medical literature such as CoroPredict, SMART and SCORE2. Smaller models (8B) can be finetuned to match or even surpass their larger counterparts as well as commercial models like ClaudeSonnet-4.5 and ChatGPT-5.2. Among all evaluated approaches, the best-performing boosting-based regression technique (CatBoost) and commercial LLM (Gemini-3-Flash) both achieve an AUROC of up to 85%. Further model-calibration and -stratification analyses reveal a systematic mortality over-prediction (ECE: 0.05-0.10) of MedLLMs, while Platt scaling effectively reduces such miscalibrations by 60-90%.

19

Individualized Per-Site Meta-Federated Feature Learning (iPS-MFFL) for Privacy-Preserving Brain Tumor MRI Classification under non-IID Heterogeneity

Hakata, Y.; Oikawa, M.; Fujisawa, S.

2026-04-17 health informatics 10.64898/2026.04.15.26351000 medRxiv

Top 0.1%

17.4%

Show abstract

BackgroundFederated learning (FL) enables collaborative model training across institutions without sharing patient-level data. However, standard FL algorithms such as FedAvg degrade under non-independently and non-identically distributed (non-IID) data, a prevalent condition when patient demographics, scanner hardware, and disease prevalence differ across hospital sites. ObjectiveWe propose iPS-MFFL (Individualized Per-Site Meta-Federated Feature Learning), a federated framework with a hierarchical local-model architecture that addresses non-IID heterogeneity through (1) a shared feature extractor, (2) multiple weak-learner classification heads that can be trained with heterogeneous training objectives to promote complementary decision boundaries, (3) independent per-learner server aggregation so that each weak learners parameters are averaged only with its counterparts at other clients, and (4) a lightweight meta-model -- itself federated -- that adaptively stacks the weak-learner outputs. The specific choices of backbone, weak-learner training objectives, and meta-model are implementation details; in this work we use an ImageNet-pretrained ResNet18 and three heterogeneous losses as a concrete instantiation. MethodsWe evaluate on the Brain Tumor MRI Classification dataset (7,200 images; 4 classes: glioma, meningioma, pituitary tumor, no tumor) partitioned across K = 5 simulated hospital sites using Dirichlet non-IID sampling ( = 0.3). Four baselines are compared: Local-only training, FedAvg, FedProx, and Freeze-FT. All experiments are repeated over three random seeds (13, 42, 2025) and evaluated using paired t-tests, Cohens d effect sizes, and post-hoc power analysis. ResultsiPS-MFFL achieved the highest mean final-round test accuracy point estimate of 85.42 {+/-} 8.70% (mean {+/-} SD across three seeds), compared to FedAvg (78.48 {+/-} 12.66%), FedProx (78.33 {+/-} 14.64%), Freeze-FT (73.98 {+/-} 21.09%), and Local (58.10 {+/-} 11.77%). iPS-MFFL showed the smallest cross-seed SD, suggesting greater robustness to partition heterogeneity. However, one-way ANOVA did not reach statistical significance (F = 1.52, p = 0.270), reflecting the limited number of experimental seeds. Cohens d effect sizes relative to iPS-MFFL ranged from 0.59 (vs. FedProx) to 2.64 (vs. Local); post-hoc pairwise comparisons are reported as exploratory given the non-significant omnibus test. Post-hoc power analysis indicated that statistical power for FL baseline comparisons was only 0.10-0.12 for the observed effect sizes (d {approx} 0.6) at n = 3 seeds. ConclusionsiPS-MFFL provides a practical approach to heterogeneous federated brain tumor classification by combining transfer learning, contrastive weak-learner diversity, and meta-learning. The framework demonstrated the highest mean accuracy and lowest variance across diverse data partitions. Validation with larger seed pools ([≥] 10 seeds for 80% power), ablation studies, and external multi-center cohorts is needed to establish generality.

20

GlioVision: A Multi-Modal MRI Framework for Non-Invasive Glioma Molecular Biomarkers Prediction

Nazir, A.; Cheema, M. N.; Hsu, Y.-C.; Jiang, X.; Zhu, J.-J.; Harmanci, A. S.; Harmanci, A.

2026-04-26 neuroscience 10.64898/2026.04.22.719993 medRxiv

Top 0.1%

17.3%

Show abstract

Gliomas are aggressive primary brain tumors that necessitate critical molecular biomarker predictions for optimal clinical decision-making. Traditional assessment relies on surgical tumor specimens analysis, which carries procedural risks and sampling bias due to tumor heterogeneity. Existing deep learning methods for non-invasive prediction lack real-time applicability, remain resource-intensive, and are frequently trained on narrowly represented datasets. We present GlioVision, a framework built on the MONAI library to process multimodal data, including glioma MRI and molecular labels, to predict and identify, non-invasively, four major glioma molecular biomarkers: IDH mutation, 1p/19q co-deletion, MGMT methylation, and WHO grade. The core architecture comprises Spatially and Channel-wise Recalibrated 3D DenseNet (SCRU-DenseNet), which utilizes a computational attention gate and an Adaptive Contrast-Specific Processing Stream (ACPS) to tackle multi-site, heterogeneous datasets. We introduced the Confidence-Filtered Predictive Manifold (CFPM) to manage uncertainty by excluding predictions with low confidence. GlioVision is trained and validated on the largest multi-cohort datasets, achieving strong biomarker prediction with AUCs of (IDH 0.94, 1p/19q 0.87, MGMT 0.86, WHO grades 0.92), supporting molecularly defined glioma diagnosis under the WHO 2021 classification guidelines. Finally, we provide a Differential Training Integrity Assessment (DTI-A) to analyze routes of MRI data privacy protections through model obfuscation. Our results advance the codebase, model release, and leakage considerations around MRI data analysis literature.